AITopics | data security

Collaborating Authors

data security

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

A Survey on Data Security in Large Language Models

Chen, Kang, Zhou, Xiuze, Lin, Yuanguo, Su, Jinhe, Yu, Yuanhui, Shen, Li, Lin, Fan

arXiv.org Artificial IntelligenceAug-5-2025

Large Language Models (LLMs), now a foundation in advancing natural language processing, power applications such as text generation, machine translation, and conversational systems. Despite their transformative potential, these models inherently rely on massive amounts of training data, often collected from diverse and uncurated sources, which exposes them to serious data security risks. Harmful or malicious data can compromise model behavior, leading to issues such as toxic output, hallucinations, and vulnerabilities to threats such as prompt injection or data poisoning. As LLMs continue to be integrated into critical real-world systems, understanding and addressing these data-centric security risks is imperative to safeguard user trust and system reliability. This survey offers a comprehensive overview of the main data security risks facing LLMs and reviews current defense strategies, including adversarial training, RLHF, and data augmentation. Additionally, we categorize and analyze relevant datasets used for assessing robustness and security across different domains, providing guidance for future research. Finally, we highlight key research directions that focus on secure model updates, explainability-driven defenses, and effective governance frameworks, aiming to promote the safe and responsible development of LLM technology. This work aims to inform researchers, practitioners, and policymakers, driving progress toward data security in LLMs.

arxiv preprint arxiv, large language model, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2508.02312

Country:

Asia > China (0.46)
North America > United States (0.28)

Genre:

Research Report (1.00)
Overview (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

SecBench: A Comprehensive Multi-Dimensional Benchmarking Dataset for LLMs in Cybersecurity

Jing, Pengfei, Tang, Mengyun, Shi, Xiaorong, Zheng, Xing, Nie, Sen, Wu, Shi, Yang, Yong, Luo, Xiapu

arXiv.org Artificial IntelligenceJan-6-2025

Evaluating Large Language Models (LLMs) is crucial for understanding their capabilities and limitations across various applications, including natural language processing and code generation. Existing benchmarks like MMLU, C-Eval, and HumanEval assess general LLM performance but lack focus on specific expert domains such as cybersecurity. Previous attempts to create cybersecurity datasets have faced limitations, including insufficient data volume and a reliance on multiple-choice questions (MCQs). To address these gaps, we propose SecBench, a multi-dimensional benchmarking dataset designed to evaluate LLMs in the cybersecurity domain. SecBench includes questions in various formats (MCQs and short-answer questions (SAQs)), at different capability levels (Knowledge Retention and Logical Reasoning), in multiple languages (Chinese and English), and across various sub-domains. The dataset was constructed by collecting high-quality data from open sources and organizing a Cybersecurity Question Design Contest, resulting in 44,823 MCQs and 3,087 SAQs. Particularly, we used the powerful while cost-effective LLMs to (1). label the data and (2). constructing a grading agent for automatic evaluation of SAQs. Benchmarking results on 16 SOTA LLMs demonstrate the usability of SecBench, which is arguably the largest and most comprehensive benchmark dataset for LLMs in cybersecurity. More information about SecBench can be found at our website, and the dataset can be accessed via the artifact link.

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2412.20787

Country: Asia > China > Hong Kong (0.05)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Trust and Dependability in Blockchain & AI Based MedIoT Applications: Research Challenges and Future Directions

Solaiman, Ellis, Awad, Christa

arXiv.org Artificial IntelligenceJan-5-2025

This paper critically reviews the integration of Artificial Intelligence (AI) and blockchain technologies in the context of Medical Internet of Things (MedIoT) applications, where they collectively promise to revolutionize healthcare delivery. By examining current research, we underscore AI's potential in advancing diagnostics and patient care, alongside blockchain's capacity to bolster data security and patient privacy. We focus particularly on the imperative to cultivate trust and ensure reliability within these systems. Our review highlights innovative solutions for managing healthcare data and challenges such as ensuring scalability, maintaining privacy, and promoting ethical practices within the MedIoT domain. We present a vision for integrating AI-driven insights with blockchain security in healthcare, offering a comprehensive review of current research and future directions. We conclude with a set of identified research gaps and propose that addressing these is crucial for achieving the dependable, secure, and patient -centric MedIoT applications of tomorrow.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2501.02647

Country: Europe > United Kingdom (0.29)

Genre:

Overview (0.86)
Research Report > Promising Solution (0.34)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Consumer Health (1.00)
Health & Medicine > Therapeutic Area > Immunology (0.69)

Technology:

Information Technology > e-Commerce > Financial Technology (1.00)
Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
(2 more...)

Add feedback

Ensuring superior learning outcomes and data security for authorized learner

Bang, Jeongho, Song, Wooyeong, Shin, Kyujin, Kim, Yong-Su

arXiv.org Machine LearningJan-1-2025

The learner's ability to generate a hypothesis that closely approximates the target function is crucial in machine learning. Achieving this requires sufficient data; however, unauthorized access by an eavesdropping learner can lead to security risks. Thus, it is important to ensure the performance of the "authorized" learner by limiting the quality of the training data accessible to eavesdroppers. Unlike previous studies focusing on encryption or access controls, we provide a theorem to ensure superior learning outcomes exclusively for the authorized learner with quantum label encoding. In this context, we use the probably-approximately-correct (PAC) learning framework and introduce the concept of learning probability to quantitatively assess learner performance. Our theorem allows the condition that, given a training dataset, an authorized learner is guaranteed to achieve a certain quality of learning outcome, while eavesdroppers are not. Notably, this condition can be constructed based only on the authorized-learning-only measurable quantities of the training data, i.e., its size and noise degree. We validate our theoretical proofs and predictions through convolutional neural networks (CNNs) image classification learning.

learner, probability, training data, (15 more...)

arXiv.org Machine Learning

2501.00754

Country:

Asia > South Korea > Seoul > Seoul (0.04)
Asia > South Korea > Daejeon > Daejeon (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.55)

Add feedback

MAIDS: Malicious Agent Identification-based Data Security Model for Cloud Environments

Gupta, Kishu, Saxena, Deepika, Gupta, Rishabh, Singh, Ashutosh Kumar

arXiv.org Artificial IntelligenceDec-18-2024

With the vigorous development of cloud computing, most organizations have shifted their data and applications to the cloud environment for storage, computation, and sharing purposes. During storage and data sharing across the participating entities, a malicious agent may gain access to outsourced data from the cloud environment. A malicious agent is an entity that deliberately breaches the data. This information accessed might be misused or revealed to unauthorized parties. Therefore, data protection and prediction of malicious agents have become a demanding task that needs to be addressed appropriately. To deal with this crucial and challenging issue, this paper presents a Malicious Agent Identification-based Data Security (MAIDS) Model which utilizes XGBoost machine learning classification algorithm for securing data allocation and communication among different participating entities in the cloud system. The proposed model explores and computes intended multiple security parameters associated with online data communication or transactions. Correspondingly, a security-focused knowledge database is produced for developing the XGBoost Classifier-based Malicious Agent Prediction (XC-MAP) unit. Unlike the existing approaches, which only identify malicious agents after data leaks, MAIDS proactively identifies malicious agents by examining their eligibility for respective data access. In this way, the model provides a comprehensive solution to safeguard crucial data from both intentional and non-intentional breaches, by granting data to authorized agents only by evaluating the agents behavior and predicting the malicious agent before granting data.

agent, artificial intelligence, machine learning, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/s10586-023-04263-9

2412.1449

Country:

Asia > Taiwan > Takao Province > Kaohsiung (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > Poland > Masovia Province > Warsaw (0.04)
(3 more...)

Genre: Research Report > Promising Solution (0.68)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Block MedCare: Advancing healthcare through blockchain integration with AI and IoT

Simonoski, Oliver, Bogatinoska, Dijana Capeska

arXiv.org Artificial IntelligenceDec-3-2024

This research explores the integration of blockchain technology in healthcare, focusing on enhancing the security and efficiency of Electronic Health Record (EHR) management. We propose a novel Ethereum-based system that empowers patients with secure control over their medical data. Our approach addresses key challenges in healthcare blockchain implementation, including scalability, privacy, and regulatory compliance. The system incorporates digital signatures, Role-Based Access Control, and a multi-layered architecture to ensure secure, controlled access. We developed a decentralized application (dApp) with user-friendly interfaces for patients, doctors, and administrators, demonstrating the practical application of our solution. A survey among healthcare professionals and IT experts revealed strong interest in blockchain adoption, while also highlighting concerns about integration costs. The study explores future enhancements, including integration with IoT devices and AI-driven analytics, contributing to the evolution of secure, efficient, and interoperable healthcare systems that leverage cutting-edge technologies for improved patient care.

application, artificial intelligence, international journal, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.5121/ijnsa.2024.16604

2412.02851

Country:

Europe > North Macedonia > Southwestern Statistical Region > Ohrid Municipality > Ohrid (0.05)
Europe > Switzerland > Basel-City > Basel (0.04)

Genre:

Research Report > New Finding (0.46)
Research Report > Promising Solution (0.46)
Overview > Innovation (0.34)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Health Care Technology > Telehealth (1.00)
Health & Medicine > Health Care Providers & Services (1.00)
Health & Medicine > Health Care Technology > Medical Record (0.90)

Technology:

Information Technology > e-Commerce > Financial Technology (1.00)
Information Technology > Security & Privacy (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence (1.00)

Add feedback

Redefining Data-Centric Design: A New Approach with a Domain Model and Core Data Ontology for Computational Systems

Johnson, William, Davis, James, Kelly, Tara

arXiv.org Artificial IntelligenceSep-1-2024

Before this, fragmented computer networks struggled to communicate seamlessly. The introduction of the Transmission Control Protocol/Internet Protocol (TCP/IP) enabled consistent data transfer and became the standard for digital communication. However, this node-centric approach, which relies heavily on Internet Protocol (IP) addresses, has also created significant security vulnerabilities and privacy concerns due to its focus on network nodes rather than the data itself. In today's digital landscape, the centralized aggregation and storage of sensitive user data -- including IP addresses -- by service providers pose substantial security risks. These centralized repositories are prime targets for cyberattacks, potentially compromising user privacy and exposing sensitive information. Additionally, the reliance on IP-based system modeling has amplified these risks, necessitating a shift toward a more secure and resilient design approach. This paper proposes a novel data-centric design methodology that moves away from traditional node-focused models. By prioritizing data as the central entity and incorporating multimodal frameworks encompassing objects, events, concepts, and actions, this approach enhances data security and flexibility. The new informatics domain model reimagines data's role in system design, emphasizing its importance throughout its entire lifecycle to foster innovation, security, and seamless data interoperability.

application example, data-centric model, ontology, (9 more...)

arXiv.org Artificial Intelligence

2409.09058

Genre: Research Report (0.40)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (0.34)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)

Add feedback

PristiQ: A Co-Design Framework for Preserving Data Security of Quantum Learning in the Cloud

Wang, Zhepeng, Sheng, Yi, Koirala, Nirajan, Basu, Kanad, Jung, Taeho, Lu, Cheng-Chang, Jiang, Weiwen

arXiv.org Artificial IntelligenceApr-20-2024

Benefiting from cloud computing, today's early-stage quantum computers can be remotely accessed via the cloud services, known as Quantum-as-a-Service (QaaS). However, it poses a high risk of data leakage in quantum machine learning (QML). To run a QML model with QaaS, users need to locally compile their quantum circuits including the subcircuit of data encoding first and then send the compiled circuit to the QaaS provider for execution. If the QaaS provider is untrustworthy, the subcircuit to encode the raw data can be easily stolen. Therefore, we propose a co-design framework for preserving the data security of QML with the QaaS paradigm, namely PristiQ. By introducing an encryption subcircuit with extra secure qubits associated with a user-defined security key, the security of data can be greatly enhanced. And an automatic search algorithm is proposed to optimize the model to maintain its performance on the encrypted quantum data. Experimental results on simulation and the actual IBM quantum computer both prove the ability of PristiQ to provide high security for the quantum data while maintaining the model performance in QML.

pristiq, qubit, secure qubit, (16 more...)

arXiv.org Artificial Intelligence

2404.13475

Country:

Asia > Philippines > Luzon > National Capital Region > City of Manila (0.06)
North America > United States > Texas (0.04)

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Artificial Intelligence enhanced Security Problems in Real-Time Scenario using Blowfish Algorithm

Chinnam, Yuvaraju, Sambana, Bosubabu

arXiv.org Artificial IntelligenceApr-14-2024

In a nutshell, "the cloud" refers to a collection of interconnected computing resources made possible by an extensive, real-time communication network like the internet. Because of its potential to reduce processing costs, the emerging paradigm of cloud computing has recently attracted a large number of academics. The exponential expansion of cloud computing has made the rapid expansion of cloud services very remarkable. Ensuring the security of personal information in today's interconnected world is no easy task. These days, security is really crucial. Models of security that are relevant to cloud computing include confidentiality, authenticity, accessibility, data integrity, and recovery. Using the Hybrid Encryption this study, we cover all the security issues and leaks in cloud infrastructure.

algorithm, cloud computing, encryption, (13 more...)

arXiv.org Artificial Intelligence

2404.09286

Country:

Asia > Singapore (0.04)
Asia > India > Andhra Pradesh (0.04)

Genre: Research Report (0.64)

Industry:

Information Technology > Services (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Cloud Computing (1.00)
Information Technology > Artificial Intelligence (1.00)
(2 more...)

Add feedback

A Forecasting-Based DLP Approach for Data Security

Gupta, Kishu, Kush, Ashwani

arXiv.org Artificial IntelligenceDec-21-2023

Sensitive data leakage is the major growing problem being faced by enterprises in this technical era. Data leakage causes severe threats for organization of data safety which badly affects the reputation of organizations. Data leakage is the flow of sensitive data/information from any data holder to an unauthorized destination. Data leak prevention (DLP) is set of techniques that try to alleviate the threats which may hinder data security. DLP unveils guilty user responsible for data leakage and ensures that user without appropriate permission cannot access sensitive data and also provides protection to sensitive data if sensitive data is shared accidentally. In this paper, data leakage prevention (DLP) model is used to restrict/grant data access permission to user, based on the forecast of their access to data. This study provides a DLP solution using data statistical analysis to forecast the data access possibilities of any user in future based on the access to data in the past. The proposed approach makes use of renowned simple piecewise linear function for learning/training to model. The results show that the proposed DLP approach with high level of precision can correctly classify between users even in cases of extreme data access.

data security, forecasting-based dlp approach, leakage, (12 more...)

arXiv.org Artificial Intelligence

2312.13704

Country:

Asia > Singapore (0.06)
North America > United States > Texas (0.04)
North America > United States > New York > Albany County > Albany (0.04)
(4 more...)

Genre: Research Report > New Finding (0.34)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.96)

Add feedback